APPARATUS AND METHOD FOR PACKET-BASED MEDIA COMMUNICATIONS

FIELD OF THE INVENTION 

This invention relates generally to packet-based
media communications and more specifically to media
conferencing within a packet-based communication network.

BACKGROUND OF THE INVENTION 

Prior to the use of packet-based voice
communications, telephone conferences were a service option
available within standard non-packet-based telephone networks
such as Pulse Code Modulation (PCM) telephone networks. As
depicted in FIGURE 1A, a standard telephone switch 15 is
coupled to a plurality of telephone terminals 16 to be
included within a conference session as well as a conference
bridge 17. It is noted that these telephone terminals 16 are
coupled to the telephone switch 15 via numerous other
telephone switches (not shown). The telephone switch 15
forwards any voice communications received from the terminals
16 to the conference bridge 17, which then utilizes a
standard algorithm to control the conference session.

One such algorithm used to control a conference
session, referred to as a "party line" approach, comprises
the steps of mixing the voice communications received from
each telephone terminal 16 within the conference session and
further distributing the result to each of the telephone
terminals 16 for broadcasting. A problem with this algorithm
is the amount of noise that is combined during the mixing
step, this noise comprising a background noise source
corresponding to each of the telephone terminals 16 within
the conference session.
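
The "party line" mixing step can be illustrated with a minimal
sketch, assuming 16-bit linear PCM frames of equal length. The
function names mix_frames and party_line are hypothetical and
chosen only for this illustration. Because every terminal's
samples are summed, every terminal's background noise reaches
every listener.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: List[List[int]]) -> List[int]:
    """Sum equal-length PCM frames sample by sample, clipping to int16."""
    if not frames:
        return []
    mixed = []
    for samples in zip(*frames):
        total = sum(samples)  # speech and background noise add together
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


def party_line(frames_by_terminal: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Mix every terminal's frame and distribute the same mix to all terminals."""
    mixed = mix_frames(list(frames_by_terminal.values()))
    return {terminal: mixed for terminal in frames_by_terminal}
```

In this sketch the mix grows with the number of terminals, which
is the noise accumulation described above.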

An improved algorithm for controlling a conference
session is disclosed within U.S. patent application 08/987,216
entitled "Method of Providing Conferencing in Telephony" by
Dal Farra et al, filed on December 9, 1997, assigned to the
assignee of the present invention, and herein incorporated by
reference. This algorithm comprises the steps of selecting
primary and secondary talkers, mixing the voice
communications from these two talkers and forwarding the
result of the mixing to all the participants within the
conference session except for the primary and secondary
talkers. The primary and secondary talkers receive the voice
communications corresponding to the secondary and primary
talkers respectively. The selection and mixing of only two
talkers at any one time can reduce the background noise level
within the conference session when compared to the "party
line" approach described above.
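
The distribution rule of the two-talker approach can be sketched
as follows. This is an illustration only, not the method of the
referenced application; it assumes 16-bit linear PCM frames, and
the names mix_two and route_two_talkers are hypothetical.

```python
from typing import Dict, List


def mix_two(a: List[int], b: List[int]) -> List[int]:
    """Add two PCM frames sample by sample, clipping to the int16 range."""
    return [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]


def route_two_talkers(
    frames: Dict[str, List[int]], primary: str, secondary: str
) -> Dict[str, List[int]]:
    """Return the frame each terminal should receive.

    Listeners get the mix of both talkers; each talker gets only the
    other talker's unmixed signal.
    """
    mixed = mix_two(frames[primary], frames[secondary])
    routed = {}
    for terminal in frames:
        if terminal == primary:
            routed[terminal] = frames[secondary]
        elif terminal == secondary:
            routed[terminal] = frames[primary]
        else:
            routed[terminal] = mixed
    return routed
```

Signals from terminals that are not selected never enter the mix,
which is why their background noise is not heard.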

In a standard PCM telephone network as is depicted
in FIGURE 1A, all of the voice communications are in PCM
format when being received at the conference bridge 17 and
when being sent to the individual telephone terminals 16.
Hence, in this situation, the mixing of the voice
communications corresponding to the primary and secondary
talkers is relatively simple with no conversions of format
required.

Currently, packet-based voice communications are
being utilized more frequently as Voice-over-Internet
Protocol (VoIP) becomes increasingly popular. In these
standard VoIP communications, voice data in PCM form is
encapsulated with a header and footer to form voice data
packets; the header in these packets has, among other things,
a Real-time Transport Protocol (RTP) header that contains a
time stamp corresponding to when the packet was generated.
One area that requires considerable improvement is the use of
packet-based voice communications to provide telephone
conferencing capabilities.
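
As an illustration of the RTP header mentioned above, the
following minimal sketch packs the 12-byte fixed RTP header in
front of a voice payload. It assumes payload type 0 (PCMU), no
padding, no header extension and no contributing sources; the
function names are hypothetical, and the lower-layer headers and
any footer are omitted.

```python
import struct


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0) -> bytes:
    """Prefix a voice payload with a 12-byte fixed RTP header."""
    first_byte = 2 << 6                  # version 2, no padding/extension, CC = 0
    second_byte = payload_type & 0x7F    # marker bit clear
    header = struct.pack(
        "!BBHII",                        # network byte order, 12 bytes in total
        first_byte,
        second_byte,
        seq & 0xFFFF,                    # 16-bit sequence number
        timestamp & 0xFFFFFFFF,          # 32-bit time stamp
        ssrc & 0xFFFFFFFF,               # 32-bit synchronization source id
    )
    return header + payload


def read_rtp_timestamp(packet: bytes) -> int:
    """Extract the 32-bit time stamp from bytes 4..7 of the RTP header."""
    return struct.unpack("!I", packet[4:8])[0]
```

A receiver can use the sequence number and time stamp carried in
this header to reorder packets and schedule playback.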

As depicted within FIGURE 1B, a plurality of
packet-based voice communication terminals, terminals A, B, C
22,24,26 in this case, are coupled to a packet-based network
20. Currently, in order for the users of these terminals
22,24,26 to communicate within a voice conference, a
packet-based voice communication central bridge 28 must be
coupled to the packet-based network 20. This conference bridge
28 has a number of problems. These problems include the latency
inherently created within the conference bridge 28, the
considerable amount of signal processing power required, the
cost of the conference bridge, the limited input/output
capacity of the conference bridge, and the maintenance and
management of the conference bridge that is required. It
should be noted that the high signal processing power required
is partially due to the conference bridge 28 having to
compensate for a variety of problems that typically exist
within current packet-based networks. These problems include
possible variable delays, out-of-sequence packets, lost
packets, and/or unbounded latency.

FIGURE 2 is a logical block diagram of a well-known
conference bridge design that could be implemented within the
network of FIGURE 1B. In this design, the conference bridge
28 comprises an inputting apparatus 30, an energy detection,
talker selection and mixing block 32 and an outputting
apparatus 34. Typically all three of these blocks are
implemented in software.

The inputting apparatus 30 performs a number of
functions on the packets that are received at the conference
bridge 28 from the terminals within a voice conference.
These functions include protocol stack, jitter buffer and
decompression operations. During the protocol stack
operation, the inputting apparatus 30 receives packets
comprising compressed voice signals, hereinafter referred to
as voice data packets, and strips off the packet overhead
required for transmitting the voice data packets through the
packet-based network 20. During the jitter buffer operation,
the inputting apparatus 30 receives the compressed voice
signals, ensures that the compressed voice signals are within
the proper sequence (i.e. time ordering signals), buffers the
compressed voice signals to ensure smooth playback and
ideally implements packet loss concealment. During the
decompression operation, the inputting apparatus 30 receives
the buffered compressed voice signals, converts them into
standard PCM format and outputs the resulting voice signals
(that are in Pulse Code Modulation) to the energy detection,
talker selection and mixing block 32.
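
The reordering and buffering performed during the jitter buffer
operation can be sketched as below. This is a simplified
illustration under stated assumptions: packets carry a
monotonically increasing sequence number, wraparound and packet
loss concealment are ignored, and the class name JitterBuffer
and its depth parameter are hypothetical.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Hold a few packets so they can be released in sequence order."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                    # packets held back for smoothing
        self._heap: List[Tuple[int, bytes]] = []
        self._next_seq: Optional[int] = None  # first sequence number not yet released

    def push(self, seq: int, payload: bytes) -> None:
        """Store a packet, dropping it if it arrives after its release point."""
        if self._next_seq is not None and seq < self._next_seq:
            return
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self) -> List[Tuple[int, bytes]]:
        """Release the oldest packets while more than `depth` are buffered."""
        released = []
        while len(self._heap) > self.depth:
            seq, payload = heapq.heappop(self._heap)
            self._next_seq = seq + 1
            released.append((seq, payload))
        return released
```

A larger depth tolerates more reordering at the cost of added
latency, which is one source of the delay attributed to the
conference bridge.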

The energy detection, talker selection and mixing
block 32 performs almost identical functionality to the
conference bridge 17 within FIGURE 1A. The key to the design
of a conference bridge 28 as depicted in FIGURE 2 is the
inputting block 30 transforming the packet-based voice
communications into PCM voice communications so the
well-known conferencing algorithms can be utilized within the
block 32. As described previously, in one conferencing
algorithm, primary and secondary talkers are selected for
transmission to the participants in the conference session to
reduce the background noise level from participants who are
not talking and to simplify the mixing algorithm required.
The selection of primary and secondary talkers is performed
with an energy detection operation to determine the voice
conference participants that are speaking, followed by a
talker selection operation to choose the primary and
secondary talkers and a mixing operation to mix the voice
communications received from the primary and secondary
talkers. The resulting output from the block 32 is a voice
communication consisting of a mix between the voice
communications received from the primary and secondary
talkers. Further outputs from the block 32 include the
unmixed voice communications of the primary and secondary
talkers that are to be forwarded, as described previously, to
the secondary and primary talkers respectively.
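
The energy detection and talker selection operations can be
sketched as follows. This is only one possible realization,
assuming mean-square energy over a PCM frame and an arbitrary
speech threshold; the names frame_energy and select_talkers and
the threshold value are hypothetical.

```python
from typing import Dict, List


def frame_energy(samples: List[int]) -> float:
    """Mean-square energy of one PCM frame (0.0 for an empty frame)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def select_talkers(
    frames: Dict[str, List[int]], threshold: float = 1000.0, max_talkers: int = 2
) -> List[str]:
    """Return up to `max_talkers` terminals, loudest first.

    Terminals whose energy is below `threshold` are treated as silent.
    With max_talkers == 2 the result is [primary, secondary].
    """
    energies = {terminal: frame_energy(f) for terminal, f in frames.items()}
    speaking = [t for t, e in energies.items() if e >= threshold]
    speaking.sort(key=lambda t: energies[t], reverse=True)
    return speaking[:max_talkers]
```

Note that this step needs decompressed PCM samples for every
terminal, which is part of the processing load described for the
well-known bridge.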

The outputting apparatus 34 performs a number of
functions on the outputs from the block 32, these functions
including compression and transmission operations. During
the compression operation, the outputting apparatus 34
receives and compresses respective ones of the three outputs
from the energy detection, talker selection and mixing block
32. During the transmission operation, the outputting
apparatus 34 performs a protocol stack operation on the
compressed voice signals, encapsulates the compressed voice
signals within the packet-based format required for
transmission on the packet-based network 20 and transmits
voice data packets comprising the compressed voice signals to
the appropriate terminals 22,24,26 within the conference
session. It is noted that, in the case of the talker
selection algorithm described above, the mixed voice signal
is forwarded to all the terminals with the exception of the
primary and secondary talkers while the primary and secondary
talkers are sent the appropriate unmixed voice signals.

One problem with the setup depicted within FIGURE 2
is the degradation of the voice signals as the voice signals
are converted from PCM format to compressed format and vice
versa within the conference bridge 28, these conversions
together being referred to generally as transcoding. A
further problem is the considerable latency that results from
the processing within the conference bridge 28. The latency
of this processing can result in a significant delay between
when the talker(s) speaks and when the other participants in
the conference session hear the speech. This delay can be
noticeable to the participants if it is beyond the perceived
real-time limits of human hearing. This could result in
participants talking while not realizing that another
participant is speaking. Yet another key problem with the
design depicted in FIGURE 2 is the considerable amount of
signal processing power that is required to implement the
conference bridge 28. As stated previously, each of the
components shown within FIGURE 2 is normally simply a software
algorithm being run on DSP component(s). This considerable
amount of required signal processing power is expensive.
Even further, another key problem within current conference
bridge designs is their limited input/output capacity. This
limited capacity is not always significant but could be
exceeded in cases where there are large numbers of
participants within the conference session. As well, a large
number of participants within a conference session could put
a strain on the capacity of the packet-based network 20
itself due to the concentration of traffic that occurs with
the use of packet-based conference bridges.

Hence, a new design within a packet-based voice
communication network is required to implement voice
conferencing functionality. In this new design, a reduction
in transcoding, latency and/or required signal processing
power within the conferencing network is needed.



SUMMARY OF THE INVENTION 

The present invention is directed to methods and
apparatus that can be utilized within a packet-based media
communication system for media conferences. In one
embodiment of the present invention, a packet-based
conference bridge receives speech indication signals from the
individual packet-based terminals within a voice conference,
these speech indication signals being used to select the
talkers within the voice conference. The speech indication
signals could be a talking/listening indication, an energy
level indication or another parameter that a talker selection
algorithm could use to select packet-based terminals as
talkers. In another embodiment of the present invention, the
packet-based conference bridge sends addressing control
signals to the individual packet-based terminals selected as
talkers. These addressing control signals indicate the
packet-based network addresses for all the packet-based
terminals that the talker should directly transmit its voice
data packets to. Yet another embodiment of the present
invention combines the use of both of the above embodiments
such that the packet-based conference bridge essentially
comprises a talker selection block that receives speech
indication signals from packet-based terminals within a voice
conference and transmits addressing control signals to the
terminals that are selected as talkers in order to direct the
voice data packets from the talker(s) to the appropriate
other packet-based terminals within the voice conference.
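
The content of an addressing control signal can be illustrated
with a minimal sketch. It assumes, purely for illustration, that
each selected talker is told to send directly to every other
terminal in the conference and that a signal is simply a list of
network addresses; the function name build_addressing_signals
and the address format are hypothetical and are not specified by
this description.

```python
from typing import Dict, List


def build_addressing_signals(
    addresses: Dict[str, str], talkers: List[str]
) -> Dict[str, List[str]]:
    """Map each selected talker to the addresses it should transmit to.

    `addresses` maps a terminal identifier to its packet network address.
    A talker is never told to transmit to itself.
    """
    signals = {}
    for talker in talkers:
        signals[talker] = [
            address for terminal, address in addresses.items() if terminal != talker
        ]
    return signals
```

Under this assumption the bridge exchanges only small control
messages, while the voice data packets travel directly between
terminals.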

There are numerous advantages of the embodiments of
the present invention compared to well-known voice
conferencing techniques. For one, all of the embodiments of
the present invention reduce the amount of processing power
required within the conference bridges. This is done by
removing the need for an energy detection block and/or an
outputting apparatus within the conference bridge. This, in
turn, can reduce the latency for the voice data packets.
Another advantage of some embodiments of the present
invention is a reduction in the transcoding that must be done.
This reduction could be caused by the reduced need to decompress
the compressed voice signals within the conference bridge due
to the independently received speech detection signals.
Further, by transmitting voice data packets in some
embodiments directly from the source of the voice data
packets to the destination of the voice data packets, a
significant reduction in transcoding can be achieved. Yet
another advantage of embodiments of the present invention is
the reduced concentration of traffic that results from the
implementation of the combined embodiments. In this case,
the conference bridge does not receive or transmit high
bandwidth voice data packets, but rather receives and
transmits control signals to manage the voice conference.
This also reduces any strain that might occur on the limited
input/output capacity for the conference bridge.

The present invention, according to a first broad
aspect, is a conference bridge including an input unit, a
talker selection unit and an output unit. The input unit
operates to receive at least one media data packet from at
least two sources forming a media conference, each media data
packet defining a media signal. The talker selection unit
operates to receive speech indication signals from at least
one of the sources within the media conference and to process
the speech indication signals including selecting a set of
the sources within the media conference as talkers. The
output unit operates to output the media signals that
correspond to the set of sources within the media conference
selected as talkers.

The present invention, according to a second broad
aspect, is a conference bridge including an input unit, an
energy detection and talker selection unit and an output
unit. The input unit operates to receive at least one media
data packet from at least two sources forming a media
conference, each media data packet defining a media signal.
The energy detection and talker selection unit operates to
determine at least one speech parameter corresponding to each
of the media signals and select a set of the sources within
the media conference as talkers based on the determined
speech parameters. The output unit operates to output
addressing control signals to the sources within the media
conference selected as talkers. The addressing control
signals comprise instructions for the sources within the
media conference selected as talkers to output their media
signals directly to other sources within the media
conference.

The present invention, according to a third broad
aspect, is a conference bridge arranged to be coupled to a
packet-based network that includes at least two sources of
media signals forming a media conference. In this aspect,
the conference bridge includes a talker selection unit
similar to that of the first broad aspect and an output unit
similar to that of the second broad aspect.

According to a fourth broad aspect, the present
invention is a packet-based apparatus arranged to be coupled
to a conference bridge via a packet-based network. The
packet-based apparatus includes an output unit and a speech
detection unit. The output unit operates to receive at least
one media signal from at least one participant within a media
conference and output the received media signal to the
conference bridge via the packet-based network. The speech
detection unit operates to process the received media signal,
generate a speech indication signal based upon the received
media signal and output the speech indication signal to the
conference bridge.

According to a fifth broad aspect, the present
invention is a packet-based apparatus arranged to be coupled
to a conference bridge via a packet-based network, the
apparatus including an addressing control unit and an output
unit. The addressing control unit operates to receive at
least one addressing control signal from the conference
bridge. The output unit operates to receive at least one
media signal from at least one participant within a media
conference and output the received media signal, via the
packet-based network, to at least one other participant
within the media conference based upon the addressing control
signal. In another embodiment of the fifth broad aspect, the
apparatus further includes a speech detection unit similar to
that of the fourth broad aspect.

In yet further aspects, the present invention is a
method for controlling a media conference, a method for a
packet-based apparatus to operate within a media conference
controlled by a conference bridge and a network incorporating
a conference bridge according to one of the first three broad
aspects.

Other aspects and features of the present invention
will become apparent to those ordinarily skilled in the art
upon review of the following description of specific
embodiments of the invention in conjunction with the
accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS 

Embodiments of the present invention are described
with reference to the following figures, in which:

FIGURE 1A is a simplified block diagram
illustrating a well-known circuit switched network with a
voice conferencing capability;

FIGURE 1B is a simplified block diagram
illustrating a well-known packet-based network with a voice
conferencing capability;

FIGURE 2 is a logical block diagram illustrating a
well-known packet-based conference bridge implemented within
the packet-based network of FIGURE 1B;

FIGURE 3 is a simplified block diagram illustrating
a well-known packet-based network coupled to a well-known PCM
telephone network with a voice conferencing capability;

FIGURE 4 is a logical block diagram illustrating a
packet-based conference bridge according to a first
embodiment of the present invention;

FIGURE 5 is a logical block diagram illustrating a
packet-based terminal according to the first embodiment of
the present invention;

FIGURES 6A and 6B are signalling diagrams
illustrating respective first and second sample operations of
a packet-based network according to the first embodiment of
the present invention;

FIGURE 7 is a logical block diagram illustrating a
packet-based conference bridge according to a second
embodiment of the present invention;

FIGURE 8 is a logical block diagram illustrating a
packet-based terminal according to the second embodiment of
the present invention;

FIGURE 9 is a functional block diagram illustrating
the operations performed within the inputting apparatus and
the decompression unit depicted within the packet-based
terminal of FIGURE 8;

FIGURE 10 is a signalling diagram illustrating a
sample operation of a packet-based network according to the
second embodiment of the present invention;

FIGURE 11 is a logical block diagram illustrating a
packet-based conference bridge according to a third
embodiment of the present invention;

FIGURE 12 is a logical block diagram illustrating a
packet-based terminal according to the third embodiment of
the present invention; and

FIGURE 13 is a signalling diagram illustrating a
sample operation of a packet-based network according to the
third embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

The present invention is directed to a number of
different methods and apparatus that can be utilized within a
packet-based voice communication system. Primarily, the
embodiments of the present invention are directed to methods
and apparatus used for voice conferences within packet-based
communication networks, but this is not meant to limit the
scope of the present invention.

One skilled in the art would understand that there
are two essential sectors for the operations of a telephone
session. These sectors include a control plane that performs
administrative functions such as access approval and
build-up/tear-down of telephone sessions and/or conference
sessions and a media plane which performs the signal processing
required on media (voice or video) streams such as format
conversions and mixing operations. As described below, the
present invention is applicable to modifications within the
media plane which could be implemented with a variety of
different control planes while remaining within the scope of
the present invention.

Embodiments of the present invention described
herein below are directed to packet-based conference bridges
and packet-based apparatus coupled within a packet-based
network that enable media conferences between numerous
sources of media signals. These sources of media signals can
be any device in which a person can output media data for
transmission within the packet-based network. In some
embodiments, the packet-based apparatus are packet-based
terminals coupled together with the packet-based conference
bridge within a packet-based network, each of the
packet-based terminals being a source for media signals for the
other packet-based apparatus.

In other embodiments, one or more of the
packet-based apparatus are packet-based network interfaces which
couple standard non-packet-based terminals, such as PCM or
analog telephone terminals, to a packet-based network, each
of the non-packet-based terminals being a source for media
signals for the media conference. This situation is
illustrated within FIGURE 3 in which a non-packet-based
telephone network, in this case PCM telephone network 38, is
coupled to the packet-based network 20, via a packet-based
network interface, in this case IP Gateway 36. As shown in
FIGURE 3, a number of standard PCM telephone terminals 40 are
coupled to the PCM telephone network 38, these PCM telephone
terminals 40 possibly being considered as sources of media
signals within embodiments of the present invention.
Further, sources of media signals could be other devices that
allow for the outputting of media data.
In the following description, it should be
understood that despite referring to the sources of media
signals as packet-based terminals within the packet-based
network throughout this document, such references could
alternatively be directed to another form of media signal
source. Further, although the packet-based apparatus
described below are the packet-based terminals that also
serve as the source for media signals, it should be
understood that, alternatively, the packet-based apparatus
could be packet-based network interfaces. Yet further,
although the following description of the present invention
is specific to voice data packets that contain compressed
voice signals and generally to voice conferencing, this
should not limit the scope of the present invention as is
described in further detail herein below.

A first embodiment of the present invention, in
which reduced processing is required within the packet-based
conference bridge compared to well-known conference bridge
designs, is now described with reference to FIGURES 4, 5, 6A
and 6B. In this embodiment, speech indication signals are
sent from the packet-based terminals 22,24,26 within the
voice conference to the packet-based conference bridge 28 so
that no speech detection operation needs to be performed
within the conference bridge itself. In one implementation,
these speech indication signals simply indicate if a
participant corresponding to a particular packet-based
terminal is speaking or not. In other implementations, the
speech indication signals indicate other parameters that
could be utilized by a talker selection algorithm to select a
set of the packet-based terminals as talkers. For example,
in one implementation, the parameters within the speech
indication signals correspond to the energy level of the
speech associated with the participants at the particular
packet-based terminals.

FIGURE 4 is a logical block diagram illustrating a
packet-based conference bridge according to this first
embodiment of the present invention. This packet-based
conference bridge replaces, within FIGURE 1B, the well-known
packet-based conference bridge depicted within FIGURE 2. As
depicted in FIGURE 4, the packet-based conference bridge 28
comprises the inputting apparatus 30 and the outputting
apparatus 34 similar to that described above with reference
to FIGURE 2. The difference in the packet-based conference
bridge 28 of FIGURE 4 is the replacement of energy detection,
talker selection and mixing block 32 with a talker selection
and mixing block 42. In this embodiment, the block 42
comprises a talker selection block 44 that receives the
speech indication signals from the packet-based terminals
within the voice conference and a mixing block 46 that is
coupled between the inputting and outputting blocks 30,34 and
further is coupled to the talker selection block 44.

In operation, the talker selection block 44
receives the speech indication signals from the packet-based
terminals within the voice conference, via the packet-based
network 20, and performs a predefined talker selection
algorithm. This talker selection algorithm could be similar
to that disclosed within U.S. patent application 08/987,216,
as incorporated by reference herein above, in which primary
and secondary talkers are selected, though the present
invention should not be limited to this implementation.
During the selection of talkers by the talker selection block
44, the technique used depends upon the particular design.
For instance, in one implementation, talkers are selected
based upon the order in which participants in the voice
conference begin to speak. In this case, the talkers are
selected as the first terminals which send speech indication
signals to the talker selection block 44 indicating that a
participant local to the particular packet-based terminal has
begun to speak. In other designs, the energy level of the
voice signals, as indicated within the speech indication
signals received from the packet-based terminals, is used by
the talker selection block 44 to select the talkers. In yet
other designs, some of the talkers could be pre-selected
while the talker selection block 44 uses the speech
indication signals simply to select the other talker(s)
within the voice conference. This could be applicable in
cases that a monitor or prearranged speaker for the voice
conference is always selected as a talker.
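
The order-of-speaking design, including pre-selected talkers,
can be sketched as follows. This is a minimal illustration that
assumes a speech indication signal carries only a terminal
identifier and a speaking flag; the class name
FirstComeTalkerSelector is hypothetical, and de-selection is
deliberately left out of this sketch.

```python
from typing import Iterable, List


class FirstComeTalkerSelector:
    """Select talkers in the order in which terminals report speech."""

    def __init__(self, max_talkers: int = 2, preselected: Iterable[str] = ()) -> None:
        self.max_talkers = max_talkers
        # Pre-selected talkers (e.g. a monitor) occupy slots from the start.
        self.talkers: List[str] = list(preselected)[:max_talkers]

    def on_indication(self, terminal: str, speaking: bool) -> List[str]:
        """Process one speech indication signal and return the current talkers."""
        has_free_slot = len(self.talkers) < self.max_talkers
        if speaking and terminal not in self.talkers and has_free_slot:
            self.talkers.append(terminal)
        return list(self.talkers)
```

Because the decision uses only the indication signals, no voice
data packet has to be decompressed in order to choose the
talkers.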

Within the implementation of FIGURE 4, the mixing
block 46 receives the selection of talkers
within the voice conference from the block 44, this selection
of talkers comprising the identification of primary and
secondary talkers in one implementation; performs a mixing
operation on the voice signals corresponding to the talkers;
and forwards the mixed voice signals and the unmixed voice
signals corresponding to the selected talkers to the
outputting apparatus 34. In this case, the outputting
apparatus 34 encapsulates and forwards the mixed voice
signals to all of the packet-based terminals within the voice
conference except the terminals that have been selected as
talkers. Further, the outputting apparatus 34 encapsulates
the unmixed voice signals corresponding to the talkers within
the voice conference and forwards the resulting voice data
packets such that each of the talkers receives the voice
signals corresponding to the other talkers within the voice
conference. If there is only a single talker selected by the
talker selection block 44, the mixing block 46 acts simply as
a selector of the voice signals corresponding to the sole
talker, these voice signals being forwarded to the outputting
apparatus 34. The outputting apparatus 34 encapsulates and
forwards these selected voice signals to all the packet-based
terminals within the voice conference except the terminal
selected as the talker.

It should be noted that a procedure for
de-selecting talkers is another operation within the talker
selection block 44. In one embodiment, the de-selection of
a packet-based terminal as a talker occurs if a speech
indication signal received from the particular terminal
indicates that a participant local to the terminal has
stopped speaking. In another embodiment, the de-selection of
a packet-based terminal as a talker occurs if speech
indication signals received from the particular terminal
indicate the speech from a participant local to the terminal
has decreased in energy. In yet another embodiment, the
de-selection of a terminal as a talker is performed if a
predetermined time interval has passed since the receipt of a
speech indication signal that indicates that the particular
terminal has a participant local to the terminal speaking.
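
Two of the de-selection rules above, an explicit "stopped
speaking" indication and the expiry of a predetermined time
interval, can be sketched together. The sketch assumes that
times are integer milliseconds supplied by the caller; the class
name TalkerExpiry and the 500 ms default are hypothetical values
chosen for illustration.

```python
from typing import Dict, List


class TalkerExpiry:
    """Track which terminals still qualify as talkers."""

    def __init__(self, timeout_ms: int = 500) -> None:
        self.timeout_ms = timeout_ms
        self._last_speech_ms: Dict[str, int] = {}

    def on_indication(self, terminal: str, speaking: bool, now_ms: int) -> None:
        """Record a speech indication signal received at time `now_ms`."""
        if speaking:
            self._last_speech_ms[terminal] = now_ms
        else:
            # Explicit "stopped speaking" indication: de-select immediately.
            self._last_speech_ms.pop(terminal, None)

    def active(self, now_ms: int) -> List[str]:
        """Terminals whose last speaking indication is within the timeout."""
        return [
            terminal
            for terminal, last in self._last_speech_ms.items()
            if now_ms - last <= self.timeout_ms
        ]
```

The timeout rule also protects against a lost "stopped speaking"
indication, since a silent terminal eventually drops out on its
own.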

There are numerous alternative implementations for
the packet-based conference bridge according to the first
embodiment of the present invention. For one, modifications
within the conference bridge could be made similar to those
described within U.S. patent application 09/475,047 entitled
"APPARATUS AND METHOD FOR PACKET-BASED MEDIA COMMUNICATIONS"
by Simard et al, filed on December 29, 1999 and incorporated
herein by reference. As indicated within U.S. patent
application 09/475,047, there are numerous implementations
for the inputting apparatus 30, talker selection and mixing
block 42 and the outputting apparatus 34 possible. For
instance, the jitter buffer operation could be removed from
the inputting apparatus 30 in some implementations. Further,
in some implementations, the inputting apparatus 30 does not
need to perform a decompression operation and the outputting
apparatus 34 does not need to perform a compression operation
on any voice signals corresponding to talkers which do not
require a mixing operation. This reduced transcoding can
result in higher quality voice signals being broadcast to the
participants of the voice conference as well as reduce the
latency of the voice data packets through the conference
bridge 28.

In yet further alternatives, the talker selection block 44 is
coupled to the inputting apparatus 30 so as to prevent the
unnecessary processing of voice data packets that are received
from packet-based terminals that are not selected as talkers.
This can be accomplished with the present invention since the
selection of the talkers within the voice conference is
independent of the processing of the received voice data
packets.

It should be noted that although the blocks
30, 34, 44, 46 within FIGURE 4 are depicted as separate
components, these blocks are meant to be logical
representations of algorithms which are hereinafter referred
to collectively as conference processing logic. Preferably,
some or all of the conference processing logic is essentially
software algorithms operating within a single control
component such as a DSP. In alternative embodiments, some or
all of the conference processing logic is comprised of hard
logic and/or discrete components.

FIGURE 5 is a logical block diagram illustrating a
packet-based terminal according to the first embodiment of
the present invention. As depicted in FIGURE 5, the
packet-based terminal comprises an inputting apparatus 50 that
receives, via the packet-based network 20, voice data packets
from the packet-based conference bridge 28, the inputting
apparatus 50 being coupled in series with a decompression
unit 52, a Digital-to-Analog (D/A) converter 54 and a speaker
56. Further, the packet-based terminal comprises a
microphone 58 coupled in series with an Analog-to-Digital
(A/D) converter 60, a compression unit 62 and an outputting
apparatus 64. Yet further, as depicted in FIGURE 5, the
packet-based terminal according to the first embodiment of
the present invention comprises a speech detector 66 coupled
to the output of the A/D converter 60.

In operation, the inputting apparatus 50 receives the voice
data packets output from the packet-based conference bridge 28
and, along with the decompression unit 52, performs similar
operations as described above for the inputting apparatus 30
within FIGURES 2 and 4. That is, the inputting apparatus 50
combined with the decompression unit 52 performs protocol
stack, jitter buffer and decompression operations. The outputs
from the decompression unit 52 are decompressed voice signals
corresponding to the voice data packets received from the
packet-based conference bridge 28, these outputs subsequently
being input to the D/A converter 54 which converts the voice
signals into an analog format and feeds the analog voice
signals to the speaker 56. The speaker 56 broadcasts the voice
signals such that a participant in the voice conference that
is local to the packet-based terminal can hear the speech of
the talkers within the voice conference.

The microphone 58 operates to receive sound waves
local to the microphone 58 and generate analog voice signals
corresponding to the sound waves, these analog voice signals
being input to the A/D converter 60. The A/D converter 60
converts the analog voice signals to a digital format and
forwards these voice signals to the compression unit 62. The
compression unit 62 combined with the outputting apparatus 64
performs similar operations to those described above for the
outputting apparatus 34 within FIGURES 2 and 4. That is, the
compression unit 62 combined with the outputting apparatus 64
performs a compression operation followed by a transmission
operation. During the transmission operation, the outputting
apparatus 64 performs a protocol stack operation on the
compressed voice signals, encapsulates the compressed voice
signals within the packet-based format required for
transmission on the packet-based network 20 and transmits
voice data packets comprising the compressed voice signals to
the inputting apparatus 30 within the packet-based conference
bridge 28.

Both of the above described operations within the
packet-based terminal of FIGURE 5 are performed within
well-known packet-based terminals. The difference with the
packet-based terminal according to the first embodiment of
the present invention as depicted in FIGURE 5 is the use of
the speech detector 66 to receive the uncompressed digital
voice signals from the A/D converter 60 and process these
signals in order to generate speech indication signals that
are forwarded to the packet-based conference bridge 28 via
the packet-based network 20. In one implementation, the
speech detector 66 determines whether a participant local to
the microphone is speaking or not by measuring the energy
level of the voice signal being output from the A/D converter
60. If the energy level is above a predetermined energy
threshold, the speech detector 66 determines that a
participant within the voice conference local to its
particular packet-based terminal is speaking and, as a
result, subsequently sends a speech indication signal
indicating that a speaking participant is at the particular
terminal. This speech indication signal is hereinafter
referred to as a talking signal. If the energy level is not
above the predetermined threshold, the speech detector 66
sends a speech indication signal indicating that only
listeners are at the particular terminal. This speech
indication signal is hereinafter referred to as a listening
signal.
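
The energy-threshold decision described above can be sketched
as follows. This is a minimal illustration, assuming that the
energy level is measured as the mean squared amplitude of one
frame of digital samples; the function names, the frame-based
measurement and the default threshold value are assumptions
made for the example.

```python
def frame_energy(samples):
    """Mean squared amplitude of one frame of digital voice samples."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def speech_indication(samples, energy_threshold=1.0e6):
    """Return the speech indication signal for one frame.

    "TALKING" when the energy is above the (assumed) threshold,
    "LISTENING" otherwise, mirroring the talking and listening
    signals described in the text.
    """
    if frame_energy(samples) > energy_threshold:
        return "TALKING"
    return "LISTENING"
```

In practice the threshold would be tuned to the microphone gain
and to the background noise expected at the terminal.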

There are numerous alternative implementations for the speech
detector 66. For instance, in one implementation, the speech
detector 66 sends the talking signal to the packet-based
conference bridge 28 when it first detects that the energy
level of the received voice signals has exceeded the
predetermined energy threshold for a first predetermined time
interval, and sends the listening signal to the packet-based
conference bridge 28 when it detects that the energy level of
the received voice signals has been below the predetermined
energy threshold for a second predetermined time interval.
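
The two time intervals act as a form of hysteresis on the
talking and listening signals. The sketch below assumes the
intervals are counted in whole frames and that one energy value
is supplied per frame; the class name and its parameters are
hypothetical.

```python
class HangoverDetector:
    """Emit a signal only after the energy has stayed on the other
    side of the threshold for a required number of consecutive frames."""

    def __init__(self, threshold, on_frames, off_frames):
        self.threshold = threshold
        self.on_frames = on_frames    # first predetermined interval, in frames
        self.off_frames = off_frames  # second predetermined interval, in frames
        self.talking = False
        self._run = 0  # consecutive frames contradicting the current state

    def update(self, energy):
        """Feed one frame energy.

        Returns "TALKING" or "LISTENING" when the state changes,
        otherwise None (no new signal needs to be sent).
        """
        above = energy > self.threshold
        if above != self.talking:
            self._run += 1
        else:
            self._run = 0
        needed = self.off_frames if self.talking else self.on_frames
        if self._run >= needed:
            self.talking = not self.talking
            self._run = 0
            return "TALKING" if self.talking else "LISTENING"
        return None
```

With on_frames=2 and off_frames=3, a single loud frame or a
short pause does not change the state reported to the bridge.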






In other embodiments, the speech indication signals
are not talking and listening signals respectively. Instead,
the speech indication signals correspond to specific
parameters extracted from the received voice signals. For
instance, the speech indication signals in one implementation
correspond to energy levels for the voice signals. In one
example, these speech indication signals could be nil energy
(0), a low energy level (E1) or a high energy level (E2).
For this example, multiple energy thresholds could be used
for comparison in order to classify the energy level of
talking at the specific packet-based terminal. In another
implementation, the extracted parameters from the voice
signals could be the pitch of the voice signals. In this
case, the pitch could either be directly forwarded to the
talker selection block 44 or, alternatively, a determination
could take place within the speech detector 66 on whether the
pitch indicates that there is speech or not. In the
alternative case, a talking or listening signal as described
above could be sent after processing the pitch values.
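
The classification into nil, low and high energy levels using
multiple thresholds can be sketched as below. The two threshold
values are placeholders chosen for the example, and the integer
codes 0, 1 and 2 are an assumed encoding of the three levels.

```python
def classify_energy(energy, low_threshold=1.0e5, high_threshold=1.0e6):
    """Map a measured energy to 0 (nil), 1 (low level) or 2 (high level).

    The thresholds are illustrative assumptions; the text only says
    that multiple energy thresholds could be used for comparison.
    """
    if energy >= high_threshold:
        return 2
    if energy >= low_threshold:
        return 1
    return 0
```

The returned level would then be sent to the conference bridge
as the speech indication signal in place of a talking or
listening signal.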

It should be noted that, although not illustrated within
FIGURE 5, an echo cancellation algorithm would need to be
implemented in the packet-based terminal if a handsfree mode
was functional within the terminal. This echo cancellation
algorithm would compensate the voice signals received at the
microphone 58 for the signals broadcast from the speaker 56.
In one embodiment, the speech detector 66 receives voice
signals output from the decompression unit 52 for echo
cancellation reference signals. In this case, the echo
cancellation reference signals are used to compensate the
signals received from the A/D converter 60 so that the signals
broadcast from the speaker 56 do not affect the analysis of
the speech detection algorithm. In other implementations, the
echo cancellation is performed at the conference bridge 28
with the talker selection block 44 compensating speech
indication signal parameters received from packet-based
terminals based upon the calculated echo effect.

Although the speech detector 66 is illustrated in
FIGURE 5 as receiving the uncompressed digital voice signals
output from the A/D converter 60, it should be noted that
this should not limit the scope of the present invention.
For instance, in one implementation, the speech detector 66
receives the analog voice signals from the microphone 58. In
this case, the speech detector 66 must perform an analog
speech detection algorithm to determine if there is speech
within the signals.

In other implementations, the speech detector 66 receives the
compressed voice signals from the compression unit 62 and/or
the voice data packets from the outputting apparatus 64. In
these cases, speech detection operations as disclosed within
U.S. patent application 09/475,047, previously incorporated by
reference, could be utilized. In one implementation, as
disclosed within U.S. patent application 09/475,047, a Voice
Activity Detection (VAD) operation is enabled at the
packet-based terminal. In this embodiment, packets (and
therefore compressed voice signals) that contain speech can be
distinguished from packets that do not by the number of bytes
contained within the packet. In other words, the size of the
compressed voice signal can determine whether it contains
speech. For example, in the case that the G.723.1 VoIP
standard is utilized, voice data packets containing voice
would contain a compressed voice signal of 24 bytes while
voice data packets containing essentially silence would
contain a compressed voice signal of 4 bytes. In another
implementation as disclosed within U.S. patent application
09/475,047, the speech detector 66 could determine if there is
speech within a compressed voice signal by monitoring a
pitch-related sector within the corresponding voice data
packet. For example, within the G.723.1 VoIP standard, the
pitch sector is an 18-bit field that contains pitch lag
information for all subframes. In this particular
implementation, the speech detector 66 could use the pitch
sector to generate a pitch value for each subframe. If the
pitch value is within a particular predetermined range, the
corresponding compressed voice signal is said to contain
speech. If not, the compressed voice signal is said to not
contain speech. This predetermined range can be determined by
experimentation or alternatively calculated mathematically. It
is noted that many current VoIP standard codecs include pitch
information as part of the transmitted packet and a similar
comparison of pitch values with a predetermined range can be
used with these standards.
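
The size-based distinction described above for G.723.1 can be
sketched as a simple length check on the compressed voice
signal. The 24-byte and 4-byte sizes are the ones given in the
text; the function name, the constant names and the choice to
return None for any other size are assumptions made for the
example.

```python
G7231_SPEECH_BYTES = 24   # compressed voice signal containing voice (per the text)
G7231_SILENCE_BYTES = 4   # compressed voice signal containing essentially silence


def contains_speech_by_size(compressed_signal: bytes):
    """Classify a compressed voice signal by its size alone.

    Returns True for a speech frame, False for a silence frame and
    None when the size matches neither case, so that the caller can
    fall back to another detection method.
    """
    size = len(compressed_signal)
    if size == G7231_SPEECH_BYTES:
        return True
    if size == G7231_SILENCE_BYTES:
        return False
    return None
```

Because only the length is inspected, no decompression of the
voice signal is needed to produce the speech indication.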

Although the blocks within FIGURE 5 are depicted as separate
components, these blocks are meant to be logical
representations of algorithms which are hereinafter referred
to collectively as media signal processing logic.
Preferably, some or all of the media signal processing logic
is essentially software algorithms operating within a single
control component such as a DSP. In alternative embodiments,
some or all of the media signal processing logic is comprised
of hard logic and/or discrete components.

There are a number of advantages of the packet-based network
according to the first embodiment of the present invention.
For one, there is a decrease in required processing power
within the conference bridge 28 compared to well-known designs
due to the removal of the energy detection operation from the
conference bridge. This removal of the energy detection
operation further, as described above, could lead to reduced
need for decoding, decompression and transcoding operations
and thus to increased quality voice signals with significantly
reduced latency.

FIGURES 6A and 6B are signalling diagrams
illustrating respective first and second sample operations of
a packet-based network according to the first embodiment of
the present invention. Within FIGURE 6A, a voice conference
is being initiated between packet-based terminals A, B, C
22, 24, 26 using conference bridge 28. In this case, the
conference bridge 28 is designed as described herein above
with reference to FIGURE 4 while each of the packet-based
terminals 22, 24, 26 is designed as described herein above
with reference to FIGURE 5. The talker selection algorithm
within this example includes the selection of primary and
secondary talkers based upon the order in which participants
begin to speak as described above.

As depicted within FIGURE 6A, initially within the
signalling diagram, terminals A, B 22, 24 transmit listening
signals 70, 72 to the conference bridge 28, these listening
signals 70, 72 indicating that no participant within the voice
conference local to the terminals A, B 22, 24 is speaking.
Terminal C 26 is transmitting a talking signal 74 to the
conference bridge 28 which indicates that a participant local
to the terminal 26 is speaking. At this point, the
conference bridge 28 selects the terminal C 26 as the primary
talker (or lone talker at this point) and voice signals
received from terminal C 26 are transmitted via the
conference bridge 28 to the terminals A, B 22, 24. Preferably,
since no mixing is required within the conference bridge
(since there is only a single talker), no transcoding is
performed within the conference bridge 28.

Next within the signalling diagram of FIGURE 6A,
the terminal B 24 transmits a talking signal 76 to the
conference bridge 28, this talking signal 76 indicating that
a participant within the voice conference local to the
terminal B 24 has begun to speak. At this point, the talker
selection algorithm within the conference bridge 28 selects
the terminal B 24 as the secondary talker in the voice
conference. Now, voice signals received from terminals B and
C 24, 26 are mixed and transmitted to terminal A 22 while
voice signals from terminals B and C 24, 26 are further
transmitted to terminals C and B 26, 24 respectively.

Subsequently, terminal A 22 sends a talking signal
78 to the conference bridge 28, this talking signal 78
indicating that a participant within the voice conference
local to terminal A 22 has begun to speak. In this case,
since primary and secondary talkers are already selected and
in this particular example only two talkers are to be
selected at a time, no change occurs within the conference
bridge 28 due to the receipt of talking signal 78.
Essentially, the participant at the terminal A 22 is being
muted within the voice conference.

Next as depicted in FIGURE 6A, the terminal B 24
transmits a listening signal 80 to the conference bridge 28,
this listening signal 80 indicating that the participant
local to terminal B 24 has stopped speaking. At this point,
terminal B 24 is deselected as the secondary talker and, if
the participant at terminal A 22 is still speaking, terminal
A 22 would be selected as the secondary talker. Thus, the
voice signals from terminal A 22 would subsequently be
received at the other terminals 24, 26 within the voice
conference. Finally, terminal C 26 transmits a listening
signal 82 to the conference bridge 28, this listening signal
82 indicating that the participant local to the terminal C 26
has stopped speaking. At this point, terminal A 22 would
become the primary talker (or lone talker).
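
The sequence of FIGURE 6A can be replayed with a small
order-based selector. This is a sketch under stated
assumptions: the class and method names are hypothetical, and
the rule that a waiting speaker is promoted as soon as a talker
slot frees up is inferred from the description of terminal A
above.

```python
class OrderBasedSelector:
    """Select talkers by the order in which terminals began to speak."""

    def __init__(self, max_talkers=2):
        self.max_talkers = max_talkers
        self._speaking = []  # terminals currently speaking, earliest first

    def on_signal(self, terminal, talking):
        """Process one talking (True) or listening (False) signal and
        return the resulting [primary, secondary] talker list."""
        if talking and terminal not in self._speaking:
            self._speaking.append(terminal)
        elif not talking and terminal in self._speaking:
            self._speaking.remove(terminal)
        return self.talkers()

    def talkers(self):
        # Terminals beyond max_talkers keep their place in line but are muted.
        return self._speaking[:self.max_talkers]


# Signals 70, 72, 74, 76, 78, 80 and 82 of FIGURE 6A, in order.
fig_6a = [("A", False), ("B", False), ("C", True), ("B", True),
          ("A", True), ("B", False), ("C", False)]
selector = OrderBasedSelector()
history = [selector.on_signal(t, talking) for t, talking in fig_6a]
```

After talking signal 78 the talkers remain C and B, so terminal
A is muted until listening signal 80 frees the secondary slot.
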
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FIGURE 6B depicts a signalling diagram similar to 
that of FIGURE 6A but with energy levels corresponding to the
voice signals being transmitted as the speech indication
signals rather than talking/listening signals. In this case,
the energy levels of the voice signals are used to determine
the primary and secondary talkers. As depicted in FIGURE 6B,
initially, terminals A, B, C 22, 24, 26 transmit respective
energy levels E(A), E(B), E(C) 84, 86, 88 of zero, zero and
energy level 1 (E1) to the conference bridge 28. At this
point, the terminal C 26 is made the primary talker (and lone
talker). Subsequently, terminal B 24 transmits an adjusted
energy level E(B) of energy level 2 (E2) to the conference
bridge 28. In this case, since E2 is greater than E1, the
terminal B 24 becomes the primary talker and terminal C 26
becomes the secondary talker. Next, as depicted in FIGURE
6B, terminal A 22 sends an energy level E(A) 92 of E2 to the
conference bridge 28, which results in terminal A 22 replacing
terminal C 26 as the secondary talker. The participant at
terminal C 26 would now be essentially muted from terminals
A, B 22, 24. Next, terminal B 24 sends an energy level E(B) 94
of zero to the conference bridge 28, indicating that the
participant local to terminal B 24 has stopped speaking.
Now, terminal A 22, which is still transmitting voice signals
at energy level E2, becomes the primary talker and terminal C
26, which is still transmitting voice signals at energy level
E1, becomes the secondary talker. Finally within the
signalling diagram of FIGURE 6B, the terminal C 26 sends an
energy level E(C) of zero to the conference bridge 28. This
results in the deselection of terminal C 26 as the
secondary talker, leaving terminal A 22 as the lone
talker.
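
The energy-based selection of FIGURE 6B can be sketched in the
same way. The class below is hypothetical; in particular, the
tie-break between two terminals reporting the same energy level
(the terminal that reported it first ranks higher) is an
assumption chosen so that the replay matches the sequence
described above.

```python
class EnergyBasedSelector:
    """Select talkers by reported energy level (0 = nil, 1 = low, 2 = high)."""

    def __init__(self, max_talkers=2):
        self.max_talkers = max_talkers
        self._levels = {}  # terminal -> (energy level, order of the report)
        self._seq = 0

    def on_energy(self, terminal, level):
        """Record a new energy level and return [primary, secondary]."""
        self._seq += 1
        self._levels[terminal] = (level, self._seq)
        return self.talkers()

    def talkers(self):
        active = [(t, lv, seq) for t, (lv, seq) in self._levels.items() if lv > 0]
        # Highest energy first; among equal levels, the earlier report wins.
        active.sort(key=lambda item: (-item[1], item[2]))
        return [t for t, _, _ in active[:self.max_talkers]]


# Energy reports of FIGURE 6B, in order.
fig_6b = [("A", 0), ("B", 0), ("C", 1), ("B", 2), ("A", 2), ("B", 0), ("C", 0)]
selector_6b = EnergyBasedSelector()
history_6b = [selector_6b.on_energy(t, lv) for t, lv in fig_6b]
```

The replay ends with terminal A as the lone talker, as in the
signalling diagram.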

It should be noted that the above descriptions of
sample signalling diagrams within a network according to the
first embodiment of the present invention should not be used
to limit the scope of the present invention. These signalling
diagrams are included to illustrate two possible
implementations of the present invention.

A second embodiment of the present invention, in which the
transmission of voice data packets is routed directly between
packet-based terminals according to instructions from a
packet-based conference bridge, is now described with
reference to FIGURES 7, 8, 9 and 10. In this embodiment,
addressing control signals are sent from the packet-based
conference bridge 28 to the packet-based terminals within a
voice conference that are selected as talkers within the
conference bridge 28. In this embodiment, the addressing
control signals indicate the packet-based network addresses
(for example Internet Protocol (IP) addresses within IP
networks) of the packet-based terminals to which the talkers
should be transmitting their voice data packets. With the
direct transmission of the voice data packets to the other
packet-based terminals within the voice conference,
significant reductions in transcoding of the voice signals can
be achieved along with reduced latency and decreased
processing requirements within the conference bridge. It is
noted though, as described herein below, that the
implementation of the second embodiment of the present
invention can result in additional processing requirements
within the individual packet-based terminals.

FIGURE 7 is a logical block diagram illustrating a
packet-based conference bridge according to a second
embodiment of the present invention. This packet-based
conference bridge replaces, within FIGURE 1B, the well-known
packet-based conference bridge depicted within FIGURE 2. As
depicted in FIGURE 7, the packet-based conference bridge 28
comprises the inputting apparatus 30 similar to that
described above with reference to FIGURE 2. The difference
in the packet-based conference bridge 28 of FIGURE 7 is the
removal of the energy detection, talker selection and mixing
block 32 and the outputting apparatus 34 and the insertion of
energy detection and talker selection block 100 coupled to
the inputting apparatus 30.

In operation, the energy detection and talker
selection block 100 receives the voice signals corresponding
to participants within a voice conference from the inputting
apparatus 30, performs an energy detection operation on the
received voice signals to determine which packet-based
terminals within the voice conference have participants local
to the terminals speaking, and selects the talker(s) within
the voice conference based upon the results of the energy
detection operation. Further, the block 100 within FIGURE 7
operates to transmit addressing control signals to the
packet-based terminals selected as talkers, the addressing
control signals indicating the packet-based network addresses
of the other packet-based terminals within the voice
conference.
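
The content of the addressing control signals can be sketched
as follows. The roster structure, the function name and the
example addresses (taken from a documentation address range)
are assumptions for illustration; the text only states that
each selected talker is told the network addresses of the other
terminals in the conference.

```python
def addressing_control_signals(roster, talkers):
    """Build one addressing control signal per selected talker.

    roster  -- dict mapping terminal name to its (ip, port) address
    talkers -- iterable of terminal names selected as talkers
    Returns a dict mapping each talker to the list of addresses of
    every other terminal within the voice conference.
    """
    return {
        talker: [addr for name, addr in roster.items() if name != talker]
        for talker in talkers
    }


roster = {
    "A": ("192.0.2.1", 5004),
    "B": ("192.0.2.2", 5004),
    "C": ("192.0.2.3", 5004),
}
signals = addressing_control_signals(roster, ["A"])
```

Here terminal A, as lone talker, would be instructed to send
its voice data packets to the addresses of terminals B and C.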

The energy detection operation performed within the energy
detection and talker selection block 100 could be implemented
in a number of different manners. For instance, it could
include one of the speech detection algorithms described above
for speech detector 66. As described previously, the operation
of energy detection/speech detection algorithms is disclosed
within U.S. patent application 09/475,047, previously
incorporated by reference. The talker selection operation
performed within the block 100 could also be implemented in
numerous different manners. Essentially, all of the possible
implementations previously described for the talker selection
block 44 of FIGURE 4 could also apply to the talker selection
operation within block 100. In some embodiments, for instance,
the talker selection operation selects primary and secondary
talkers based upon the order in which participants began to
speak.

As described above, the selection of the talkers
within block 100 determines which packet-based terminals
within the voice conference receive the addressing control
signals, the addressing control signals giving the talkers
permission to transmit their voice data packets to the other
terminals within the voice conference. As well, the
addressing control signals preferably forward the
packet-based network addresses corresponding to the other
packet-based terminals that are needed to transmit the voice
data packets directly. In alternative implementations, the
talker(s) do not require the packet-based network addresses
since they have them stored internally. In this case, the
addressing control signals are simply permission signals to
allow the talkers to transmit to the other packet-based
terminals within the voice conference.

As an option to the conference bridge according to the second
embodiment of the present invention depicted in FIGURE 7, the
mixing block 46 and outputting apparatus 34 could be
implemented in similar manner to that described above with
reference to FIGURE 4. In this case, the conference bridge 28
operates to mix and transmit the voice signals corresponding
to the talkers prior to the talker(s) receiving permission to
directly transmit their voice signals to the other
packet-based terminals within the voice conference. These
components 46, 34 would operate in a similar manner as those
described above for FIGURE 4. As well, similar alternatives to
those discussed above would be possible with components
46, 34.

There are numerous alternative implementations for
the packet-based conference bridge according to the second
embodiment of the present invention. For one, similar to the
first embodiment of the present invention, modifications
within the conference bridge could be made similar to those
described within U.S. patent application 09/475,047,
previously incorporated by reference. As indicated within
U.S. patent application 09/475,047, there are numerous
possible implementations for the inputting apparatus 30 and
energy detection and talker selection block 100.

It should be noted that although the blocks
30, 100, 46, 34 within FIGURE 7 are depicted as separate
components, these blocks are meant to be logical 
representations of algorithms which are hereinafter referred 
to collectively as conference processing logic. Similar to 
the first embodiment of the packet-based conference bridge, 
preferably, some or all of the conference processing logic is 
essentially software algorithms operating within a single 
control component such as a DSP. In alternative embodiments, 
some or all of the conference processing logic is comprised 
of hard logic and/or discrete components. 

FIGURE 8 is a logical block diagram illustrating a
packet-based terminal according to the second embodiment of
the present invention. In this embodiment, the packet-based
terminal comprises the same components as described
previously with reference to FIGURE 5 but with the speech
detector 66 removed, the outputting apparatus 64 replaced
with outputting apparatus 106 and an addressing control unit
108 added.

In the operation of the packet-based terminal of
FIGURE 8, the outputting apparatus 106 transmits voice data
signals corresponding to voice signals generated at the
microphone 58 to the conference bridge 28. If the block 100
within the conference bridge 28 selects the particular
packet-based terminal as a talker, the block 100 transmits an
addressing control signal to the addressing control unit 108
within the terminal. This addressing control unit allows the
packet-based terminal to transmit its voice data packets
directly to the other terminals within the voice conference.
The addressing control signals provide the information needed
to uniquely identify the other terminals to which the voice
data packets are to be transmitted. In one particular example,
the addressing control signal could include IP addresses
and/or port addresses. As discussed above, alternatively, the
packet-based terminal has these addresses stored internally.
Subsequent to receiving an addressing control signal from the
block 100 within the packet-based conference bridge 28, the
addressing control unit 108 adjusts the outputting apparatus
106 such that the apparatus 106 further outputs its voice
data packets to the packet-based terminals dictated by the
conference bridge 28. In this operation, the outputting
apparatus 106 continues to transmit its voice data packets to
the conference bridge 28 as well so that the energy detection
and talker selection block 100 can adjust the selection of
talkers as necessary. If the packet-based terminal is
deselected as a talker, a de-selection control signal is sent
to the addressing control unit 108, the reception of the
de-selection control signal resulting in the discontinuation
of the direct transmitting of the voice data packets to the
other terminals within the voice conference.
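
The behaviour of the addressing control unit 108 described
above can be sketched as a small state holder that decides
where the outputting apparatus sends each voice data packet.
The class and method names are hypothetical and the addresses
in the usage lines are examples only.

```python
class AddressingControlUnit:
    """Tracks the destinations for a terminal's voice data packets."""

    def __init__(self, bridge_address):
        self.bridge_address = bridge_address
        self.peer_addresses = []  # empty while the terminal is not a talker

    def on_addressing_control(self, peers):
        """Addressing control signal received: start direct transmission."""
        self.peer_addresses = list(peers)

    def on_deselection(self):
        """De-selection control signal received: stop direct transmission."""
        self.peer_addresses = []

    def destinations(self):
        # The bridge always receives the packets so that it can keep
        # running energy detection and adjust the talker selection.
        return [self.bridge_address] + self.peer_addresses


unit = AddressingControlUnit(("192.0.2.10", 5004))
before = unit.destinations()
unit.on_addressing_control([("192.0.2.2", 5004), ("192.0.2.3", 5004)])
during = unit.destinations()
unit.on_deselection()
after = unit.destinations()
```

Keeping the bridge address as a permanent destination reflects
the statement that the terminal continues to transmit to the
conference bridge while it is a talker.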

It should be recognized that modifications are
required within the inputting apparatus 50 within the
packet-based terminal for the second embodiment of the present
invention if more than one talker is allowed to be selected
at a time. This is because, according to the second
embodiment of the present invention, this would result in
more than one set of voice data packets arriving at the
inputting apparatus 50. In the case of primary and secondary
talkers being selected by the block 100, it is possible that
a particular terminal will receive voice data packets from
two different talkers. In this situation, the packet-based
terminal must mix the primary and secondary voice signals to
generate mixed voice signals.



FIGURE 9 is a functional block diagram illustrating
the modified operations performed within the inputting
apparatus 50 and the decompression unit 52 for the situation
that primary and secondary talkers are transmitting voice
data packets to the packet-based terminal simultaneously. As
depicted in FIGURE 9, voice data packets from the primary and
secondary talkers are input to respective protocol stacks 120
which are further coupled in series with respective jitter
buffers 122 and decompression blocks 124. The decompressed
outputs from the decompression blocks 124 are input to a
mixer 126 that generates a mixed voice signal to be output to
the D/A converter 54. In operation, the protocol stacks 120
remove the packet overhead from the received voice data
packets and output voice signals in compressed format. The
jitter buffers 122 operate to ensure that the voice signals
are within the proper sequence (i.e. time ordering voice
signals) and to buffer the voice signals to ensure smooth
playback. The decompression blocks 124 decompress the voice
signals such that they are preferably in PCM format and the
mixer 126 operates to mix the decompressed voice signals
together using well-known techniques.
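
The reordering and mixing steps of FIGURE 9 can be sketched as
below. This is a simplified illustration: it assumes each
received frame carries a sequence number, that the decompressed
signals are 16-bit PCM samples, and that mixing is a saturating
sample-by-sample sum, which is one well-known technique among
several. The function names are hypothetical.

```python
def reorder(frames):
    """Jitter buffer step: return frames in sequence-number order.

    frames -- list of (sequence_number, samples) tuples as received
    """
    return [samples for _, samples in sorted(frames, key=lambda f: f[0])]


def mix(primary, secondary):
    """Mixer step: add two PCM frames, saturating to the 16-bit range."""
    length = max(len(primary), len(secondary))
    mixed = []
    for i in range(length):
        a = primary[i] if i < len(primary) else 0
        b = secondary[i] if i < len(secondary) else 0
        mixed.append(max(-32768, min(32767, a + b)))
    return mixed


ordered = reorder([(2, [30, 40]), (1, [10, 20])])
mixed = mix([1000, 30000, -30000], [2000, 10000, -10000])
```

Saturation avoids wrap-around distortion when both talkers are
loud at the same instant.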




Although depicted as separate components within
FIGURE 9, the pair of protocol stacks 120, the pair of jitter
buffers 122 and the pair of decompression blocks 124
preferably comprise a single protocol stack software
algorithm, a single jitter buffer software algorithm and a
single decompression software algorithm respectively, each of
which is capable of being run for each received packet. In this
implementation, the software algorithms are possibly run in
parallel as more than one voice data packet can be received
at one time. It is noted that U.S. patent application
09/475,047, incorporated by reference previously, discloses a
packet-based terminal with an inputting apparatus similar to
that described above with reference to FIGURE 9.

Although the blocks within FIGURE 8 are depicted as
separate components, similar to the packet-based terminal of
FIGURE 5, these blocks are meant to be logical
representations of algorithms which are hereinafter referred
to collectively as media signal processing logic.
Preferably, some or all of the media signal processing logic
is essentially software algorithms operating within a single
control component such as a DSP. In alternative embodiments,
some or all of the media signal processing logic is comprised
of hard logic and/or discrete components.

There are a number of advantages of the packet-based
network according to the second embodiment of the present
invention. With the direct transmission of voice data packets
from one packet-based terminal to other packet-based
terminals, there is a significantly lighter load on the
conference bridge which translates into higher capacity.
Further, the conferencing configuration of the second
embodiment reduces the concentration effect in which
conference bridges are traditionally significant sources and
sinks of traffic within the network and redistributes the
traffic more evenly within the packet-based network. Yet
further, the direct transmission of the voice data packets
can reduce the need for transcoding and also decrease the
overall latency.
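The load argument above can be illustrated with a rough stream
count. The sketch below rests on assumptions that are not
stated in this passage: that a conventional mixing bridge
returns one mixed stream to every terminal, and that in the
second embodiment the bridge receives voice data packets from
every terminal for analysis but forwards no media itself. The
function name and dictionary keys are hypothetical.

```python
def bridge_media_streams(terminals: int, talkers: int, direct: bool) -> dict:
    """Count media streams at the bridge and between terminals for one conference."""
    if terminals < 2 or not 0 <= talkers <= terminals:
        raise ValueError("need at least two terminals and 0..terminals talkers")
    inbound = terminals  # every terminal sends its voice packets to the bridge
    if direct:
        outbound = 0                         # assumed: bridge forwards no media
        peer = talkers * (terminals - 1)     # each talker sends to all others
    else:
        outbound = terminals                 # assumed: one mixed stream per terminal
        peer = 0
    return {"bridge_in": inbound, "bridge_out": outbound, "peer_to_peer": peer}


conventional = bridge_media_streams(terminals=3, talkers=2, direct=False)
second_embodiment = bridge_media_streams(terminals=3, talkers=2, direct=True)
```

Under these assumptions the media leaving the bridge drops to
zero, and the corresponding traffic appears instead as
terminal-to-terminal streams spread across the network.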

FIGURE 10 is a signalling diagram illustrating a
sample operation of a packet-based network according to the
second embodiment of the present invention. Within FIGURE
10, a voice conference is being initiated between
packet-based terminals A,B,C 22,24,26 using conference bridge
28. In this case, the conference bridge 28 is designed as
described herein above with reference to FIGURE 7 while each
of the packet-based terminals 22,24,26 is designed as
described herein above with reference to FIGURE 8. The
talker selection algorithm within this example includes the
selection of primary and secondary talkers based upon the
order in which participants begin to speak.

As depicted within FIGURE 10, initially within the
signalling diagram, terminal A 22 transmits voice data
packets 130 to the conference bridge 28. These voice data
packets 130 are processed within the conference bridge 28
and, in this sample operation, terminal A 22 is selected as
the primary talker (and lone talker) since the voice data
packets 130 contain speech. In response to this talker
selection, the conference bridge 28 sends an addressing
control signal 132 to the terminal A 22, this addressing
control signal 132 instructing the terminal A 22 to transmit
its voice data packets directly to terminals B,C 24,26. As
depicted in FIGURE 10, the terminal A 22 subsequently starts
transmitting voice data packets 134 to the terminals B,C
24,26. Although not illustrated in FIGURE 10, the
transmitting of voice data packets from terminal A 22 to both
the conference bridge 28 and the other terminals B,C 24,26
within the voice conference would continue until the
conference bridge 28 instructed the terminal A 22 to stop,
presumably due to the terminal A 22 being deselected as a
talker.
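The terminal-side behaviour just described can be sketched as a
small piece of addressing state. This is an illustration under
stated assumptions rather than the terminal of FIGURE 8: the
class SendingTerminal, its method names and the address strings
are hypothetical, and an empty destination list is merely one
assumed way of encoding the instruction to stop.

```python
from typing import List, Optional


class SendingTerminal:
    """Send-side addressing state of one packet-based terminal (illustrative)."""

    def __init__(self, name: str, bridge: str) -> None:
        self.name = name
        self.bridge = bridge          # the bridge always receives the packets
        self.peers: List[str] = []    # empty until this terminal is selected

    def on_addressing_control(self, peers: Optional[List[str]]) -> None:
        # A non-empty list selects this terminal as a talker; None or an
        # empty list is the assumed encoding of the instruction to stop.
        self.peers = list(peers or [])

    def destinations(self) -> List[str]:
        # Every voice data packet goes to the bridge for analysis and,
        # while selected, directly to the other terminals as well.
        return [self.bridge] + self.peers


terminal_a = SendingTerminal("A", bridge="bridge-28")
before = terminal_a.destinations()            # packets 130: bridge only
terminal_a.on_addressing_control(["B", "C"])  # addressing control signal 132
during = terminal_a.destinations()            # packets 134: bridge plus B and C
terminal_a.on_addressing_control(None)        # deselected as a talker
after = terminal_a.destinations()
```

Because the bridge stays in the destination list throughout, it
can keep analysing the packets and later deselect the terminal.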

Next, within FIGURE 10, voice data packets 136 are
transmitted from terminal B 24 to the conference bridge 28.
These voice data packets 136, in the situation being depicted
in FIGURE 10, result in the conference bridge 28 selecting
the terminal B 24 as the secondary talker since the voice
data packets 136 contain speech. In response to the talker
selection, the conference bridge 28 instructs the terminal B
24 to transmit its voice data packets directly to the
terminals A,C 22,26 with the use of an addressing control
signal 138. Once this addressing control signal 138 is
received at the terminal B 24, the terminal B 24 proceeds to
transmit its voice data packets 140 to the other terminals
A,C 22,26 within the voice conference (along with continuing
to transmit the voice data packets to the conference bridge
28 for analysis). In this situation, terminal C 26 receives
voice data packets from both terminals A and B 22,24 and a
mixing operation would be required.

As depicted in FIGURE 10, terminal C 26
subsequently begins to transmit voice data packets 142 to the
conference bridge 28. Assuming that the voice data packets
being transmitted to the conference bridge 28 from the
terminals A,B 22,24 are still deemed to contain speech, in
this particular situation the terminal C 26 is not selected
as a talker regardless of whether the voice data packets 142
contain speech or not.

A third embodiment of the present invention, in
which the first and second embodiments of the present
invention are combined, is now described with reference to
FIGURES 11, 12 and 13. In this embodiment, speech indication
signals are sent from the packet-based terminals within the
voice conference to the packet-based conference bridge 28 and
addressing control signals are sent from the conference
bridge 28 to the packet-based terminals that are selected as
talkers. This allows the packet-based network of the third
embodiment of the present invention to gain the advantages of
both the first and second embodiments described above.

In this third embodiment of the present invention,
the packet-based conference bridge 28 is reduced to simply a
talker selection block 150 as illustrated in FIGURE 11. The
talker selection block 150 operates in similar fashion to
talker selection block 44 in terms of selecting talkers based
upon the received speech indication signals while the block
150 operates in similar fashion to block 100 in terms of
sending addressing control signals based upon the selection
of the talker(s). The talker selection block 150 could be
implemented in numerous manners similar to the blocks 44,100
described above with reference to FIGURES 4 and 7
respectively.

FIGURE 12 is a logical block diagram illustrating a
packet-based terminal according to the third embodiment of
the present invention. As depicted within FIGURE 12, the
packet-based terminal comprises similar components to the
packet-based terminal described above with reference to
FIGURE 8 but additionally comprises the speech detector 66
previously described for the first embodiment of the terminal
with reference to FIGURE 5. Alternatives similar to those
described above for the packet-based terminals of FIGURES 5
and 8 are also possible for the packet-based terminal
according to the third embodiment of the present invention
depicted in FIGURE 12.
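As an illustration of how a terminal of this kind might derive
its speech indication signals, the sketch below uses a simple
frame-energy threshold. This passage does not specify how the
speech detector 66 decides that speech is present, so the
threshold test, the default threshold value, the class name
SpeechDetector and the choice to emit a signal only when the
state changes are all assumptions made for the example.

```python
from typing import List, Optional


class SpeechDetector:
    """Emit 'talking' or 'listening' when the detected state changes (illustrative)."""

    def __init__(self, threshold: float = 500.0) -> None:
        self.threshold = threshold          # assumed mean-amplitude threshold
        self.talking: Optional[bool] = None  # unknown until the first frame

    def process(self, frame: List[int]) -> Optional[str]:
        # Mean absolute amplitude of one frame of 16-bit PCM samples.
        level = sum(abs(s) for s in frame) / len(frame) if frame else 0.0
        talking = level >= self.threshold
        if talking == self.talking:
            return None                     # no change, nothing to signal
        self.talking = talking
        return "talking" if talking else "listening"


detector = SpeechDetector(threshold=500.0)
signals = [
    detector.process([0, 10, -10]),      # quiet first frame
    detector.process([5, -5]),           # still quiet
    detector.process([2000, -3000]),     # speech begins
    detector.process([1500, -1500]),     # speech continues
    detector.process([0, 0]),            # speech ends
]
```

A practical detector would normally add smoothing or a hangover
period so that brief pauses do not toggle the signal.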

FIGURE 13 is a signalling diagram illustrating a
sample operation of a packet-based network according to the
third embodiment of the present invention. Within FIGURE 13,
a voice conference is being initiated between packet-based
terminals A,B,C 22,24,26 using conference bridge 28. In this
case, the conference bridge 28 is designed as described
herein above with reference to FIGURE 11 while each of the
packet-based terminals 22,24,26 is designed as described
herein above with reference to FIGURE 12. The talker
selection algorithm within this example includes the
selection of primary and secondary talkers based upon the
order in which participants begin to speak.

As depicted within FIGURE 13, initially within the
signalling diagram, terminals B,C 24,26 transmit listening
signals 162,164 to the conference bridge 28, these listening
signals 162,164 indicating that no participant within the
voice conference local to the terminals 24,26 is speaking.
Terminal A 22 is transmitting a talking signal 160 to the
conference bridge 28 which indicates that a participant local
to the terminal 22 is speaking. At this point, the
conference bridge 28 selects the terminal A 22 as the primary
talker and an addressing control signal 166 is transmitted to
terminal A 22. This addressing control signal 166 instructs
the terminal A 22 to transmit its voice data packets 168 to
the other terminals B,C 24,26 within the voice conference.

Next within FIGURE 13, the terminal B 24 transmits
a talking signal 170 to the conference bridge 28, this
talking signal indicating that a participant within the voice
conference which is local to terminal 24 is speaking. At
this point, the conference bridge 28 selects the terminal B
24 as the secondary talker and transmits an addressing
control signal 172 to the terminal B 24. Once the addressing
control signal 172 is received at the terminal B 24, the
terminal proceeds to transmit its voice data packets 174 to
the other terminals A,C 22,26 within the voice conference. In
this situation, terminal C 26 receives voice data packets
from both terminals A and B 22,24 and a mixing operation
would be required.

As depicted in FIGURE 13, terminal C 26
subsequently transmits a talking signal 176 to the conference
bridge 28, this talking signal 176 indicating that a
participant within the voice conference local to terminal C
26 has begun to speak. In this case, since primary and
secondary talkers are already selected and in this particular
example only two talkers are to be selected at a time, an
addressing control signal is not sent to the terminal C 26
and no permission is given for terminal C 26 to transmit its
voice data packets to the other terminals A,B 22,24.
Essentially, the participant at the terminal C 26 is being
muted within the voice conference.
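The sequence of FIGURE 13 can be reproduced with a short sketch
of a talker selection block driven only by talking and
listening signals. It is a simplified illustration, not the
block 150 itself: the class TalkerSelector and its method are
hypothetical, an empty destination list is an assumed encoding
of a stop instruction, and the sketch does not attempt to
re-admit a muted terminal when a talker slot later becomes
free.

```python
from typing import List, Tuple


class TalkerSelector:
    """First-come talker selection from speech indication signals (illustrative)."""

    def __init__(self, terminals: List[str], max_talkers: int = 2) -> None:
        self.terminals = list(terminals)
        self.max_talkers = max_talkers
        self.talkers: List[str] = []  # index 0 is primary, index 1 is secondary

    def on_speech_indication(self, terminal: str, talking: bool) -> List[Tuple[str, List[str]]]:
        """Return addressing control signals as (target, destination list) pairs."""
        if talking:
            if terminal in self.talkers or len(self.talkers) >= self.max_talkers:
                return []  # already selected, or no free slot: stays muted
            self.talkers.append(terminal)
            peers = [t for t in self.terminals if t != terminal]
            return [(terminal, peers)]
        if terminal in self.talkers:
            self.talkers.remove(terminal)
            return [(terminal, [])]  # assumed form of the instruction to stop
        return []


bridge = TalkerSelector(["A", "B", "C"])
bridge.on_speech_indication("B", False)          # listening signal 162
bridge.on_speech_indication("C", False)          # listening signal 164
sig_a = bridge.on_speech_indication("A", True)   # talking signal 160
sig_b = bridge.on_speech_indication("B", True)   # talking signal 170
sig_c = bridge.on_speech_indication("C", True)   # talking signal 176
```

Here sig_a and sig_b correspond to the addressing control
signals 166 and 172, while the empty sig_c reflects that no
addressing control signal is sent to terminal C.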

The packet-based terminal of the embodiments
described herein above is not specific to any one
packet-based voice communications standard (such as VoIP
G.711, G.729, G.723, etc.), as it can be modified such that
it can be used for numerous different standards. In one
alternative embodiment, the packet-based terminal is a
multi-mode terminal that allows for voice conferences of a
number of different standards to utilize the single
packet-based terminal.

It should be noted that, although the network
described above for embodiments of the present invention was
specific to networks used for voice conferencing, this should
not limit the scope of the present invention. For instance,
the network of packet-based terminals could be used for
point-to-point communications as well as voice conferencing.
In the case of a point-to-point voice communication, both
terminals would select the other participant as a lone
talker. This allows a point-to-point conversation to be
expanded to a larger voice conference with no major
configuration modifications.

In general, although the operation of the present
invention was described herein above with use of the terms
voice data packets and voice signals, these packets and
signals can be referred to broadly as media data packets and
media signals respectively. In this case, media data packets
are any data packets that are transmitted via the media
plane, these media data packets preferably being either audio
or audio/video data packets. It is noted that use of the
term voice data packets above is specific to the described
embodiments in which the audio signals are voice. Further,
it should be understood that video data packets may
incorporate audio data packets.

Although the present invention described herein
above has a single voice conference being established
with the use of a network of packet-based apparatus and a
conference bridge, it should be understood that in some
embodiments the conference bridge and/or one or more of the
packet-based apparatus could be capable of handling a
plurality of voice conferences simultaneously.

Persons skilled in the art will appreciate that
there are yet more alternative implementations and
modifications possible for implementing the present
invention, and that the above implementation is only an
illustration of this embodiment of the invention. The scope
of the invention, therefore, is only to be limited by the
claims appended hereto.



